Voice over IP security - SIP and RTP protocols
Tobias Glemser, Reto Lorenz
Voice Over IP (VoIP) is one of the hottest buzzwords in
contemporary IT, even more so since the last CeBit in March 2005, and a
new hope for both service providers and device manufacturers. Countries
with good network infrastructure typically have several offers of VoIP
bundles, consisting of a hardware router with VoIP functionality and
attractive pricing for both Internet access and telephony. VoIP is set
to displace stationary telephony solutions sooner or later, but serious
security issues tend to go unnoticed in all the hype.
Today, VoIP technology is a common component of broadband
Internet access offers, with free calls between VoIP users within the
same provider and cheap all-inclusive offers for interfacing to classic
telephony systems serving to spur the popularity of this technology.
What's more, it is not only the SOHO (Small Office Home Office) users
who are embracing VoIP - larger companies are also increasingly recognising
the technology's potential for communications infrastructure
consolidation. They can now connect branch offices with one fibre-optic
cable and use it to transmit both voice and data. Employees can always
be reached at the same phone numbers, regardless of where they
physically are, while the dual use of network infrastructure sharply
cuts the costs of purchasing, installing and maintaining active and
passive network components. As usual, problems only appear after a
system has been bought and deployed, as manufacturers are not too
forthcoming in this matter, preferring to push their brilliant
migration strategies and overvalued services instead.
One of these shortcomings received a lot of media
attention recently, when a thirteen year old girl died because the US
emergency call number (911) had not been routed in the VoIP network her
mother used. In most countries, legal regulations concerning the
routing of emergency calls in VoIP networks simply don't exist yet,
and the issue has only recently come under discussion.
Besides organisational deficiencies, several attacks
against the VoIP technical infrastructure exist. Before approaching
them, we'll need to understand the basics of SIP (Session Initiation
Protocol) security. We will stick to SIP, as current trends clearly
indicate a migration away from H.323 and towards SIP.
The purpose of this article is not to introduce SIP itself (see Frame SIP - Simply bare necessities for some background information),
but rather to see how attacks against VoIP can be conducted and what
can be done to guard against them. The attacks described here target a
typical VoIP environment which uses SIP as the signalling protocol, and
are based on commonly used methods, as implementation-specific attack
methods are beyond the scope of this article.
SIP - Simply bare necessities
SIP packets contain initial call setup parameters. All
other parameters - such as RTP connection attributes - are sent using
the Session Description Protocol (SDP), which is embedded into SIP
messages as the message body. SIP packets can be divided into request
and response packets. Messages are encoded using the UTF-8 standard, so
they are directly readable if no other security measures are employed.
SIP messages are very similar to HTTP - Table 1 shows the
required header request fields. A glance at the protocol elements
reveals that the protocol definitions actually provide contextual
communication, even if data is sent using a stateless transport
protocol such as UDP.
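Because SIP messages are plain text with an HTTP-like layout, they can be taken apart with a few string operations. The following is a minimal sketch rather than a complete parser: the function names are illustrative, repeated headers (such as multiple Via lines) overwrite each other, and folded header lines are not handled.

```python
def parse_sip_message(raw: str):
    """Split a SIP message into start line, header dict and body."""
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        # Header names are case-insensitive, so store them in lower case.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


def is_request(start_line: str) -> bool:
    """Responses begin with the SIP version, requests begin with a method."""
    return not start_line.startswith("SIP/2.0")


raw = (
    "REGISTER sip:sip.example.com SIP/2.0\r\n"
    "To: <sip:123456@sip.example.com>\r\n"
    "CSeq: 20187 REGISTER\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)
start_line, headers, body = parse_sip_message(raw)
method, request_uri, version = start_line.split(" ")
print(method, request_uri, headers["cseq"])
```

Anyone who can see the packets can do the same, which is why unprotected SIP signalling is so easy to analyse.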
Now we know the basic SIP components, let's have a look
at the literal request strings (see Table 2), corresponding to several
different request methods. SIP can be enhanced with new request
methods, so we will only be referring to the basic ones (see the relevant
RFCs for specifications of other methods). The request methods and
their related request strings indicate that several types of attacks
can be conducted (a discussion of other response classes and their uses
is beyond the scope of this article).
Messages are integrated into the communication context.
The latter may contain two types of components: dialogues and
transactions, with each dialogue potentially including multiple transactions. For example, any VoIP call is an SIP dialogue consisting of the INVITE, ACK and BYE
transactions. User agents must be capable of storing dialogue status
for an extended period in order to generate messages with the correct
parameters.
The use of dialogues means that there are several other connection parameters besides Call-ID
- two of these are tag and branch. It must be noted that the
correspondence between context-specific values and user-agent behaviour
is not as clear-cut as other SIP definitions, which is one reason for
the existence of buggy, unreliable and insecure implementations.
After a call is successfully switched through an SIP
proxy, the actual voice communication proceeds using RTP. Using the
exchanged codecs, voice messages are transferred between the
communicating parties (provided direct IP communication is possible),
and the SIP proxy is only needed for call release.
Table 1. SIP request header fields
| Header | Description |
| Request-URI | Contains the method, the request URI and the SIP version used. The request URI is typically the same address as the To field (except for the REGISTER method). |
| To | Target for the message and its associated method. The target is a logical recipient, because it is not clear from the beginning whether the message will reach the named recipient. Depending on the communication context, a tag value may also be attached. |
| From | Logical identifier of the request sender. The From field has to contain a tag value, which is chosen by the client. |
| CSeq | Short for Command Sequence. Used for checking the order of the message within a transaction. Consists of an integer value and an identifier of the request method. |
| Call-ID | Unique value assigned to identify all the messages within a dialogue. It should be established using cryptographic methods. |
| Max-Forwards | Used to avoid loop situations. If no external criteria exist for specifying a certain value, the value 70 should be given. |
| Via | Shows the forwarding path and response target location. The field has to contain a branch value, which is unique to a specific user agent. The Branch-ID always starts with z9hG4bK and uses the request to mark the beginning of a transaction. |
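The fields in Table 1 can be put together as follows. This is only a sketch under stated assumptions: the helper names are made up for illustration, the identifier lengths are arbitrary, and the Contact and Expires lines simply mirror a typical registration. The point is that Call-ID, tag and branch come from a cryptographically strong random source and that the branch carries the z9hG4bK prefix.

```python
import secrets

MAGIC_COOKIE = "z9hG4bK"  # prefix every branch value has to start with


def new_call_id(host: str) -> str:
    """Random Call-ID, hard to guess for someone who has not seen the message."""
    return f"{secrets.token_hex(16)}@{host}"


def new_tag() -> str:
    return secrets.token_hex(4)


def new_branch() -> str:
    return MAGIC_COOKIE + secrets.token_hex(16)


def build_register(user: str, domain: str, local_addr: str, cseq: int) -> str:
    """Assemble a REGISTER request containing the header fields from Table 1."""
    aor = f"sip:{user}@{domain}"
    lines = [
        f"REGISTER sip:{domain} SIP/2.0",
        f"Via: SIP/2.0/UDP {local_addr};branch={new_branch()}",
        f"From: <{aor}>;tag={new_tag()}",
        f"To: <{aor}>",
        f"Contact: <sip:{user}@{local_addr}>",
        f"Call-ID: {new_call_id(domain)}",
        f"CSeq: {cseq} REGISTER",
        "Expires: 1800",
        "Max-Forwards: 70",
        "Content-Length: 0",
    ]
    return "\r\n".join(lines) + "\r\n\r\n"


# Hypothetical user, domain and local address, used only for illustration.
message = build_register("123456", "sip.example.com", "10.10.10.1:5060", 1)
print(message)
```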
Table 2. SIP request header methods
| Method | Description |
| REGISTER | Method for registering and deregistering a proxy client. Registering is required to prepare for VoIP communication. Deregistering is done by setting the period value to 0. |
| INVITE | The most important method, and the reason we need SIP. All subsequent methods are subordinate to it, even if they are used in isolation. INVITE is used to set up new calls. |
| ACK | Once a call (such as a video conference) is set up, readiness is acknowledged by sending a separate ACK request. A streaming connection immediately follows. |
| BYE | Used to end calls normally. Sending it terminates a transaction established using INVITE. A BYE message will not be processed without the appropriate dialogue parameter (Call-ID or tag). |
| CANCEL | Used for cancelling a connection before a call is established. Also used in error situations. |
| OPTIONS | Used to establish the supported request methods or the transmission media attribute. |
| NOTIFY | Additional request method defined in RFC 3265, allowing a client to be notified of the status of the resource they are connected to (for example receiving notification of new voice messages). |
SIP and family
Understanding VoIP communication requires a discussion of
several protocols used for setting up and ending a call. The work is
divided between them: separate protocols handle signalling, voice
transfer and gateway messages. Unlike
traditional telephony, where - from a user's point of view -
communication requires only a single cable, VoIP involves split
communication paths. Here are the most important protocols:
These protocols provide core VoIP functionality and are
used in a growing number of implementations. Other protocols also
exist, but here we will focus just on the ones listed above.
To appreciate how attacks can be approached, we will go
through the process of setting up a basic call, using just one SIP
proxy for all examples. The proxy is a part of the signalling and dial
switching infrastructure. In practice, there are usually two or more
switching SIP proxies, especially if the call participants are not
within the same network environment. If several proxies are used, they
also exchange SIP messages, which results in extra layers of
communication. Before we go into more detail, Figure 1 provides an
overview of the basic mechanism. The actual protocols contain no
ground-breaking features. SIP, for instance, uses some very typical
techniques, including elements of HTTP, while RTP was defined almost 10
years ago and last updated in 2003.
Figure 1. Overview of setting up a call using SIP
SIP/ARP attacks against VoIP
Several attack vectors exist, each requiring different
activity on the part of the attacker. We will look at seven of the most
popular, most effective and most widely discussed attacks, and see how
they can be used in practice.
The main reason for the vulnerability of VoIP when
compared to Plain Old Telephone Systems (POTS) is the use of a shared
medium. No dedicated line exists for call transactions, just a network
used by lots of users and lots of different applications. This makes it
much easier for an attacker to tap into communication - all he needs to
do is use a suitable computer.
Eavesdropping on telephone calls and replaying them in
front of the communicating parties is definitely one of the most
impressive attacks on VoIP. As outlined earlier, signalling is done via
an SIP proxy, while the actual communication between parties uses the
peer-to-peer model. In our scenario, we want to listen in on the
conversation between Alice and Bob. To achieve this, we should launch a
man in the middle (MITM) attack using ARP poisoning (see Frame ARP
poisoning attack) to convince the proxy and Alice and Bob's VoIP phones
that they actually want to communicate with us rather than each other.
ARP poisoning attack
The attacker poisons the ARP table of the systems to be
attacked. The purpose of the ARP table is to convert logical IP
addressing to actual physical addressing
in Layer 2 of the OSI reference model (Ethernet MAC addresses). Almost
every non-hardened operating system accepts unrequested ARP replies, so
the attacker first fills the ARP table with all the IP addresses he
wants to get between and then deposits his own MAC address for all
these IP addresses by sending such unrequested ARP replies. Each packet
received is duly forwarded to the original recipient, who is also being
poisoned. Communication keeps working, and the communicating parties
will not notice the interception unless they use cryptographic
mechanisms like TLS/SSL.
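The effect of poisoning on the ARP tables can also be used to spot it. The sketch below is a simplified illustration, not a replacement for a real monitoring tool: it assumes that the (IP, MAC) pairs have already been collected from observed ARP replies, and all addresses are made up. It flags IP addresses whose MAC address changed and MAC addresses that claim several IP addresses at once.

```python
from collections import defaultdict


def find_arp_anomalies(observations):
    """observations: iterable of (ip, mac) pairs seen in ARP replies, in order."""
    macs_per_ip = defaultdict(list)
    ips_per_mac = defaultdict(set)
    for ip, mac in observations:
        mac = mac.lower()
        if mac not in macs_per_ip[ip]:
            macs_per_ip[ip].append(mac)
        ips_per_mac[mac].add(ip)
    # An IP address that was announced with more than one MAC address.
    changed = {ip: macs for ip, macs in macs_per_ip.items() if len(macs) > 1}
    # A MAC address that answers for more than one IP address.
    shared = {mac: sorted(ips) for mac, ips in ips_per_mac.items() if len(ips) > 1}
    return changed, shared


# Hypothetical addresses: Alice, Bob and the proxy, then the attacker's replies.
observations = [
    ("10.10.10.1", "00:11:22:33:44:01"),    # Alice
    ("10.10.10.2", "00:11:22:33:44:02"),    # Bob
    ("10.10.10.254", "00:11:22:33:44:fe"),  # SIP proxy
    ("10.10.10.1", "de:ad:be:ef:00:01"),    # unrequested replies from the attacker
    ("10.10.10.2", "de:ad:be:ef:00:01"),
    ("10.10.10.254", "de:ad:be:ef:00:01"),
]
changed, shared = find_arp_anomalies(observations)
print(changed)
print(shared)
```

A legitimate router or a host with several addresses can also show up in the second result, so the output is a hint to investigate rather than proof of an attack.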
Figure 2 presents an outline of VoIP transmission
sniffing. First, the call is set up. Alice sends the SIP proxy a
request to call Bob. The message is intercepted and forwarded by the
attacker. The SIP proxy now tries to reach Bob to tell him that Alice
wants to communicate with him - this message is intercepted and
forwarded, too. After successful call initialisation, the actual call
between Alice and Bob begins (using the RTP protocol), and this RTP
communication is also intercepted and forwarded by the attacker.
If you use a tool like Ethereal to sniff the
communication, you will also receive the RTP stream payload. To listen
to it, you can load the sniffed data into a voice decoder like the
Firebird DND-323 Analyzer or use Ethereal itself, provided the G.711
U-law (PCMU) or G.711 A-law (PCMA) codecs are used (these are the
international standards for coding and decoding telephony
transmissions).
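To show how little stands between a captured packet and audible speech, here is a rough sketch of what such a decoder does for G.711 U-law. It assumes a plain RTP packet with the fixed 12-byte header (header extensions and padding are ignored), and the sample packet is constructed by hand for illustration. Payload type 0 denotes PCMU and 8 denotes PCMA.

```python
import struct

PAYLOAD_TYPES = {0: "PCMU", 8: "PCMA"}  # static RTP payload types for G.711


def parse_rtp(packet: bytes):
    """Return (payload_type, sequence, timestamp, ssrc, payload) of an RTP packet."""
    if len(packet) < 12:
        raise ValueError("an RTP header is at least 12 bytes long")
    first, second, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    if first >> 6 != 2:
        raise ValueError("not RTP version 2")
    csrc_count = first & 0x0F       # each CSRC entry adds 4 bytes to the header
    payload_type = second & 0x7F    # the top bit of this byte is the marker bit
    return payload_type, seq, timestamp, ssrc, packet[12 + 4 * csrc_count:]


def ulaw_to_linear(byte: int) -> int:
    """Decode one G.711 U-law byte into a signed 16-bit PCM sample."""
    byte = ~byte & 0xFF
    sign = byte & 0x80
    exponent = (byte >> 4) & 0x07
    mantissa = byte & 0x0F
    sample = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -sample if sign else sample


# Hand-made packet: version 2, payload type 0 (PCMU), three audio bytes.
packet = struct.pack("!BBHII", 0x80, 0, 1, 160, 0x1234) + bytes([0xFF, 0x00, 0x80])
payload_type, seq, timestamp, ssrc, payload = parse_rtp(packet)
samples = [ulaw_to_linear(b) for b in payload]
print(PAYLOAD_TYPES[payload_type], samples)
```

Writing the decoded samples of all packets, ordered by sequence number, into a WAV file yields the conversation.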
Figure 2. VoIP sniffing
A very clever tool for performing both voice decoding and
ARP poisoning is called Cain & Abel (see Frame On the Net). Once
you have it up and running, you should check all existing hosts in your
subnet (using ARP requests) by clicking the plus
symbol. These hosts can now be seen under the tab Sniffer and can be
chosen as victims in the sub-tab ARP. For our attack, we will select
the IP addresses of Alice, Bob and the SIP proxy. After clicking the
Start/Stop ARP button, the ARP poisoning is initialized and the
attacker has only one thing left to do - sit and wait. The rest is done
by Cain & Abel (see Figure 3). If a call between Alice and Bob was
established and concluded, it will automatically be stored as a WAV
file and shown in the VoIP tab - you can listen to the conversation
using any audio player. By the way, if the communicating parties
happened to exchange some passwords in the meantime (POP3 for example),
the attacker might want to have a look at them using the Passwords tab.
Figure 3. Voice decoding with Cain & Abel
As you can see, if no additional security measures are
employed, an attacker within the local network can easily sniff the
communication and then simply listen to it.
Identity theft and registration hijacking
Registering with an SIP proxy is normally done by
submitting a username and password. As already mentioned, SIP messages
are unencrypted. If an attacker is sniffing the authentication process
(for example using ARP spoofing), he can use the username and password
combination to authenticate himself on the SIP proxy.
However, such attacks are no longer possible for
contemporary VoIP implementations. The authentication process (see
Frame Security measures within VoIP protocols) and other secured operations make use of digest
authentication. The client starts by attempting to authenticate with
the SIP proxy (see Listing 1). The proxy rejects the authentication
attempt by sending the status code 401 Unauthorized (Listing 2) and
returns a demand for the client to log on using digest authentication.
In the line beginning with WWW-Authenticate, a random nonce value is
provided.
Security measures within VoIP protocols
Apart from mechanisms for protecting contextual
communication, SIP features a number of other security measures (though
these are not obligatory for SIP implementations), dealing mainly with
authentication and cryptographic security of communication.
Several authentication methods are available. A common one is called digest authentication - a simple challenge-response mechanism which can be used for any request.
Another way of securing SIP packets is to use the
well-known S/MIME protocol, which allows the SIP message body to be
secured with S/MIME certificates. Using S/MIME assumes that a PKI and
the necessary certificate verification mechanisms are available. In
case of SIP, S/MIME is typically used to secure SDP messages, but using
it in practice can be arduous and time-consuming if the necessary
infrastructure is not in place.
Other security mechanisms require additional protocol
elements. For example, TLS can be used both for SIP and RTP, but in the
case of SIP the protection is only hop-by-hop, so it cannot be
automatically assumed that the other party is using a TLS enabled phone.
Listing 1. SIP registration phase 1 (client to SIP proxy)
REGISTER sip:sip.example.com SIP/2.0
Via: SIP/2.0/UDP 10.10.10.1:5060;rport;
branch=z9hG4bKBA66B9816CE44C848BC1DEDF0C52F1FD
From: Tobias Glemser <sip:123456@sip.example.com>;tag=1304509056
To: Tobias Glemser <sip:123456@sip.example.com>
Contact: "Tobias Glemser" <sip:123456@10.10.10.1:5060>
Call-ID: 2FB73E1760144FC0978876D9D69AE254@sip.example.com
CSeq: 20187 REGISTER
Expires: 1800
Max-Forwards: 70
User-Agent: X-Lite
Content-Length: 0
Listing 2. SIP registration phase 2 (proxy to client) - rejection
SIP/2.0 401 Unauthorized
Via: SIP/2.0/UDP 10.10.10.1:5060;rport=58949;
branch=z9hG4bKBA66B9816CE44C848BC1DEDF0C52F1FD
From: Tobias Glemser <sip:123456@sip.example.com>;tag=1304509056
To: Tobias Glemser <sip:123456@sip.example.com>;
tag=b11cb9bb270104b49a99a995b8c68544.a415
Call-ID: 2FB73E1760144FC0978876D9D69AE254@sip.example.com
CSeq: 20187 REGISTER
WWW-Authenticate: Digest realm="sip.example.com",
nonce="42b17a71cf370bb10e0e2b42dec314e65fd2c2c0"
Server: sip.example.com ser
Content-Length: 0
In the third step (see Listing 3), the client
re-authenticates, this time also sending an Authorization header
containing the username, the appropriate realm and the nonce value
previously sent by the server. The most important part is the response
value, which is usually an MD5 hash generated from the username, realm,
password, the nonce sent by the server, the request method (REGISTER
in this case) and the request URI. The message is processed by the
server, which builds its own MD5
hash from the same data. If the two hashes are equal, authentication
has been successful and is acknowledged by a status message from the
server (Listing 4).
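The calculation can be sketched as follows, using the basic digest scheme from RFC 2617 without the optional qop parameter, which matches the fields visible in the listings. The password below is a placeholder, as the real one is of course not part of the captured messages, so the result is not expected to reproduce the response value shown in Listing 3.

```python
import hashlib
import hmac


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri, nonce):
    """Digest response as defined in RFC 2617 when no qop is used."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")  # identifies the user
    ha2 = md5_hex(f"{method}:{uri}")                 # identifies the request
    return md5_hex(f"{ha1}:{nonce}:{ha2}")


def server_accepts(received, username, realm, password, method, uri, nonce):
    """The server repeats the calculation and compares the two hashes."""
    expected = digest_response(username, realm, password, method, uri, nonce)
    return hmac.compare_digest(received, expected)


nonce = "42b17a71cf370bb10e0e2b42dec314e65fd2c2c0"
response = digest_response(
    "123456", "sip.example.com", "placeholder-password",
    "REGISTER", "sip:sip.example.com", nonce,
)
print(response)
```

Since the nonce is part of the hashed string, a response sniffed for one nonce is of no use once the server has issued a different one.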
Listing 3. SIP registration phase 3 (client to proxy) - re-authentication
REGISTER sip:sip.example.com SIP/2.0
Via: SIP/2.0/UDP 10.10.10.1:5060;rport;
branch=z9hG4bK913D93CF77A5425D9822FB1E47DF7792
From: Tobias Glemser <sip:123456@sip.example.com>;tag=1304509056
To: Tobias Glemser <sip:123456@sip.example.com>
Contact: "Tobias Glemser" <sip:123456@10.10.10.1:5060>
Call-ID: 2FB73E1760144FC0978876D9D69AE254@sip.example.com
CSeq: 20188 REGISTER
Expires: 1800
Authorization: Digest username="123456",realm="sip.example.com",
nonce="42b17a71cf370bb10e0e2b42dec314e65fd2c2c0",
response="bef6c7346eb181ad8b46949eba5c16b8",uri="sip:sip.example.com"
Max-Forwards: 70
User-Agent: X-Lite
Content-Length: 0
Listing 4. SIP registration phase 4 (proxy to client) - success
SIP/2.0 200 OK
Via: SIP/2.0/UDP 10.10.10.1:5060;rport=58949;
branch=z9hG4bK913D93CF77A5425D9822FB1E47DF7792
From: Tobias Glemser <sip:123456@sip.example.com>;tag=1304509056
To: Tobias Glemser <sip:123456@sip.example.com>;
tag=b11cb9bb270104b49a99a995b8c68544.017a
Call-ID: 2FB73E1760144FC0978876D9D69AE254@sip.example.com
CSeq: 20188 REGISTER
Contact: <sip:123456@10.10.10.1:5060>;q=0.00;expires=1800
Server: sip.example.com ser
Content-Length: 0
The hash sent in step 3 has two features that prevent
fake authentication or the use of previously intercepted user data: it
is valid only for the random nonce value and includes the username and
password. This means that it is practically impossible for an attacker
to break the password and tap into communication in a realistic amount
of time.
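The protection against replayed responses depends on the server treating its nonces as short-lived, one-time values. How real proxies manage nonces differs between implementations; the class below is only an illustrative sketch with made-up names and an arbitrary 30 second lifetime.

```python
import secrets
import time


class NonceStore:
    """Issues random nonces and accepts each of them once, for a limited time."""

    def __init__(self, lifetime_seconds=30.0, clock=time.monotonic):
        self._lifetime = lifetime_seconds
        self._clock = clock
        self._issued = {}  # nonce -> time of issue

    def issue(self) -> str:
        nonce = secrets.token_hex(20)
        self._issued[nonce] = self._clock()
        return nonce

    def consume(self, nonce: str) -> bool:
        # pop() removes the nonce, so a replayed response fails the second time.
        issued_at = self._issued.pop(nonce, None)
        if issued_at is None:
            return False
        return self._clock() - issued_at <= self._lifetime


store = NonceStore()
nonce = store.issue()
first_use = store.consume(nonce)    # genuine client answering the challenge
second_use = store.consume(nonce)   # attacker replaying the sniffed message
print(first_use, second_use)
```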
DoS - Denial of Service
As with any other service, it is always possible to bring
down a VoIP service if you have enough bandwidth available. In case of
an SIP proxy, this could be done by using a register-storm attack to
overload the service. Implementation vulnerabilities can also make DoS
attacks against the service itself possible. It might even be possible
to gain access to the server using buffer overflow attacks - one such
vulnerability was discovered in 2003 in the open source Asterisk PBX
server (CAN-2003-0761). Exploiting flawed parameter processing with MESSAGE and INFO messages, an attacker could launch local commands in the context of the asterisk service, which is typically started by root.
SIP's susceptibility to going down due to invalid SIP
messages depends on the implementation - if a specific server has no
mechanisms for handling (or even just ignoring) invalid messages, it
might eventually go down. The Java-based PROTOS Test Suite is available
to test server behaviour, and any PBX (Private Branch Exchange) owner
would be well advised to run it against his box (see Frame On the Net).
A different type of DoS is user-supported DoS. Figure 4
shows a UDP message sent to an SIP phone with login 14 and IP
192.168.5.84 from the SIP-Proxy 192.168.5.25. By sending this message,
the proxy (or the attacker) signals that the user has new voice mail in
their inbox. You might notice this by having a look at the message body
and the Messages-Waiting: yes and Voice-Message: 1/0 entries. The same notification applies for example to fax messages. The first digit (1) indicates how many new messages are stored, while the second (0) shows the number of old messages.
Figure 4. A modified SIP packet
As you can see, we have edited this packet. This can easily be done using the Packetyzer
utility for Windows (see Frame On the Net), which is technically based
on Ethereal. Any packet can be edited, and incorrect checksums are also
shown and can be corrected. We can send our message to arbitrary
recipients - we only need the user's IP and login ID, which is usually
the same as their phone number. To illustrate that no further
information is necessary, we will fill all other fields with 0 values (such fields as User-Agent don't matter, of course).
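The two body entries that make the phone react, Messages-Waiting and Voice-Message, are simple text lines. A phone-side interpretation might look roughly like the sketch below; the function name is illustrative and no real phone firmware is being quoted.

```python
def parse_message_summary(body: str):
    """Return (waiting, counts) where counts maps message class to (new, old)."""
    waiting = False
    counts = {}
    for line in body.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue  # skip empty or malformed lines
        name = name.strip().lower()
        value = value.strip()
        if name == "messages-waiting":
            waiting = value.lower() == "yes"
        elif "/" in value:
            # "1/0" means one new message and no old messages.
            new, _, old = value.partition("/")
            counts[name] = (int(new), int(old.split()[0]))
    return waiting, counts


body = "Messages-Waiting: yes\r\nVoice-Message: 1/0\r\n"
waiting, counts = parse_message_summary(body)
print(waiting, counts)
```

Nothing in these lines ties the notification to the real state of the mailbox, so the phone has no way of telling a forged notification from a genuine one by its content alone.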
Faking such a message shouldn't be a problem - after all,
it doesn't contain any sensitive information, does it? Most phones (we
tested a Cisco 9750 and a Grandstream BT-100) process such messages
(even ones with incorrect checksums) and show them to the user.
Usually, a notification icon or the whole display starts to blink. The
user now calls their mailbox to listen to the non-existent new message.
Because there is no new message, the user might think this is just a
bug and ignore it. Shortly afterwards, the display starts blinking
again. Now our user is calling technical support, who will busily set
about locating the error (which could actually be quite amusing to look
at, considering that there is no error).
If an attacker starts sending such messages to all the
users in a network, both the users and the support staff will waste a
great deal of time trying to track down the error. Sending the message
to many users at once will also result in everyone calling their
mailbox, potentially leading to service congestion or even a server
breakdown.
Call interruption
Many papers report that sending a simple BYE
message to a call participant is enough to immediately terminate a
call. Well, it isn't quite that easy. First of all, as we already know,
the attacker has to know the call ID of the call dialogue. RFC 3261
says: The Call-ID header field acts as a unique identifier to group
together a series of messages. It MUST be the same for all requests and
responses sent by either UA [User-Agent] in a dialogue.
There is no strict rule that the call ID has to be
generated by hashing or has to be non-incremental, but most
implementations exhibit exactly this behaviour, using randomly chosen
call IDs. This means that in order to end the call using the call ID,
the attacker would need to sniff out the call initialization phase, and
if he's in a position to do so, then the content of the call would
presumably be of much more interest than the ability to simply end the
call.
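The check a user agent performs before acting on a BYE can be sketched as follows. This is a simplified model with made-up values: a dialogue is identified by the Call-ID together with the two tags, and an incoming BYE is only accepted if all three match the stored state.

```python
from typing import NamedTuple, Optional


class DialogId(NamedTuple):
    call_id: str
    local_tag: str
    remote_tag: str


def extract_tag(header_value: str) -> Optional[str]:
    """Return the tag parameter of a From or To header value, if present."""
    params = header_value.rpartition(">")[2]  # ignore anything inside <...>
    for param in params.split(";")[1:]:
        key, _, value = param.strip().partition("=")
        if key.lower() == "tag":
            return value
    return None


def incoming_bye_matches(headers: dict, dialog: DialogId) -> bool:
    """For a BYE from the other party, From carries their tag and To carries ours."""
    return (
        headers.get("call-id") == dialog.call_id
        and extract_tag(headers.get("from", "")) == dialog.remote_tag
        and extract_tag(headers.get("to", "")) == dialog.local_tag
    )


# Hypothetical dialogue state and messages.
dialog = DialogId("2FB73E17@sip.example.com", local_tag="1304509056", remote_tag="b11cb9bb")
genuine = {
    "call-id": "2FB73E17@sip.example.com",
    "from": "<sip:bob@sip.example.com>;tag=b11cb9bb",
    "to": "<sip:alice@sip.example.com>;tag=1304509056",
}
forged = {**genuine, "call-id": "guessed@sip.example.com"}
print(incoming_bye_matches(genuine, dialog), incoming_bye_matches(forged, dialog))
```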
Phreaking
Phreaking, or defrauding telephony services,
traditionally accomplished by sending special system tones in public
call boxes, may well experience a revival. Due to the decoupling of
payload (RTP voice stream) and signalling (SIP), the phreaking scenario
outlined below seems pretty likely, though at present it is not yet
possible.
A prepared client sets up a new call to another prepared
client. Both connect via an SIP proxy and behave in a normal manner.
Directly after the call has been established, the proxy receives a
signal to end the call, which both clients acknowledge, but without
actually quitting the RTP streaming. The call has not ended, but the
SIP server doesn't notice it.
If both clients are located within the same subnet, the
call would not end in any case, as the voice stream is P2P. If there's
a breakout through the SIP proxy (for example if connecting to another
network), RTP communication is routed via the proxy, which now has to
end the RTP stream itself. The proxy would therefore have to recognize
that call termination has been signalled via SIP and transfer this
information directly to RTP communication control.
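One way to picture this coupling is a relay that forwards RTP only for calls which the signalling side still considers active. The class below is a toy model with made-up names and addresses, not a description of how any particular proxy works.

```python
class MediaRelay:
    """Toy model: RTP is forwarded only while SIP regards the call as active."""

    def __init__(self):
        self._active = {}  # Call-ID -> set of (ip, port) endpoints allowed to send

    def on_call_established(self, call_id, endpoints):
        self._active[call_id] = set(endpoints)

    def on_bye(self, call_id):
        # Signalling says the call is over, so the media path is closed as well.
        self._active.pop(call_id, None)

    def should_forward(self, source) -> bool:
        return any(source in endpoints for endpoints in self._active.values())


relay = MediaRelay()
alice = ("10.10.10.1", 8000)
bob = ("192.0.2.7", 8000)
relay.on_call_established("2FB73E17@sip.example.com", [alice, bob])
before_bye = relay.should_forward(alice)
relay.on_bye("2FB73E17@sip.example.com")
after_bye = relay.should_forward(alice)   # prepared clients keep sending RTP
print(before_bye, after_bye)
```

If the media path bypasses the proxy, as in the peer-to-peer case, there is no such control point and the clients can keep talking after the BYE.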
Another phreaking attack might also be possible,
depending on the SIP proxy implementation. Some implementations, like
the current version of Asterisk, require re-authentication using digest
authentication (as presented in Listings 1-4) for almost every single
client-server exchange. However, other implementations only require
re-authentication after a certain period of time, and the following
scenario demonstrates how this could be exploited to generate costs for
the provider.
An attacker sends a valid INVITE
message to the SIP proxy using the credentials of a successfully
authenticated user. The SIP proxy now initializes the call, and the
remaining packets required for successful call initialization can be
sent by the attacker after a specific time, without waiting for the
response packets from the server. Some special service number operators
charge enormous amounts for a call, regardless of call duration. Using
this scenario, an attacker could cause other users to be charged high
rates for short special service calls.
SPIT (SPam over Internet Telephony)
SPIT is one of the most commonly mentioned dangers of
establishing VoIP services - attackers can send junk voice messages
just like e-mail spam. Unlike calls from robots in the world of
traditional telephony, VoIP calls don't generate initial costs. Like
spammers, a spitter uses the victim's address, except in this case it
is not their e-mail, but their SIP address. With the increasing
popularity of IP telephony, it's only a matter of time before spitters
will be able to easily obtain a great many valid SIP addresses,
especially if central address books are indeed going to be introduced.
The spitter calls an SIP number, the victim's SIP proxy
processes the call and the victim now has to listen to junk such as the
required minimum size of one's manhood. Just like a spammer, a spitter
needs just one thing - bandwidth. Voice messages require considerably
more resources than e-mails. Assuming a 15 second message (as few
victims could handle listening to more), one piece of spit would be 120
kB in size (if using a 64 kbps codec). The activity of trojan horses -
just as with spam once again - could cause any unprotected Internet
user to unwittingly send SPIT using their own bandwidth.
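The size estimate is easy to verify. The short calculation below counts only the audio payload and ignores RTP, UDP and IP header overhead; the 1000 kbps uplink is an assumed figure used purely for illustration.

```python
def audio_payload_bytes(bitrate_kbps: int, seconds: int) -> int:
    """Bytes of audio produced by a constant bit rate codec."""
    return bitrate_kbps * 1000 * seconds // 8


def concurrent_calls(uplink_kbps: int, bitrate_kbps: int) -> int:
    """Upper bound on simultaneous outgoing streams, ignoring packet overhead."""
    return uplink_kbps // bitrate_kbps


size = audio_payload_bytes(64, 15)    # 64 kbps codec, 15 second message
streams = concurrent_calls(1000, 64)  # assumed 1000 kbps uplink
print(size, streams)
```

At 120,000 bytes per message, a spitter needs far more bandwidth per victim than an e-mail spammer, which is why hijacked machines with their own uplinks are attractive.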
Diallers
A revival in the use of diallers, which were declared
dead when non-dial-up technologies like DSL and cable modems became
popular, may pose another threat. Because of the way an SIP client
connects, we have the same scenario as with ordinary diallers which use
modems or ISDN lines to call premium numbers. For example, a dialler
could infect an SIP client and install a certain number as the standard
call prefix or specify a new and very expensive SIP proxy. Calls would
then be made through these costly numbers unknown to the user - at
least until the first bill arrived.
No such diallers have yet been seen in the wild, but it's
probably just a matter of time before we hear the first stories of VoIP
dialler success.
Conclusion
There is no doubt that VoIP is one of the most thrilling
IT innovations of the past few years and is set to become another
widespread use for the Internet and dominate both corporate and private
phone networks. Judging by the media attention given to VoIP security
problems, it might seem that the combination of SIP and RTP protocols
is a rather feeble coupling. Whatever the truth, security problems
should always be carefully considered before migrating to a new
technology.
As this article has shown, numerous attack vectors have
been known for years - most are just slightly modified attacks on the
IP protocol. Successful attacks against SIP/RTP are typically possible
in LAN structures with unencrypted communications, for example by
sniffing RTP streams. This attack is absolutely no different to
sniffing data communications in TCP/IP. Most of the other attacks can
only be successful if the SIP proxy or the UAC (User Agent Client)
does not process the call ID correctly or if the attacker sniffs out the
call ID. Security is also at risk if no digest authentication is
demanded for every single action which requires it. However, SPIT is
likely to be the biggest problem - when it comes to money, we can be
sure that no evil advertiser will hesitate to make use of the new medium.